290 research outputs found
A Dynamic Algorithm for Network Propagation
Network propagation is a powerful transformation that amplifies signal-to-noise ratio in biological and other data. To date, most of its applications in the biological domain employed standard techniques for its computation that require O(m) time for a network with n vertices and m edges. When applied in a dynamic setting where the network is constantly modified, the cost of these computations becomes prohibitive. Here we study, for the first time in the biological context, the complexity of dynamic algorithms for network propagation. We develop a vertex decremental algorithm that is motivated by various biological applications and can maintain propagation scores over general weights at an amortized cost of O(m/(n^{1/4})) per update. In application to real networks, the dynamic algorithm achieves significant, 50- to 100-fold, speedups over conventional static methods for network propagation, demonstrating its great potential in practice
Estimating population size via line graph reconstruction
Background: We propose a novel graph theoretic method to estimate haplotype population size from genotype data. The method considers only the potential sharing of haplotypes between individuals and is based on transforming the graph of potential haplotype sharing into a line graph using a minimum number of edge and vertex deletions. Results: We show that the resulting line graph deletion problems are NP complete and provide exact integer programming solutions for them. We test our approach using extensive simulations of multiple population evolution and genotypes sampling scenarios. Our results also indicate that the method may be useful in comparing populations and it may be used as a first step in a method for haplotype phasing. Conclusions: Our computational experiments show that when most of the sharings are true sharings the problem can be solved very fast and the estimated size is very close to the true size; when many of the potential sharings do not stem from true haplotype sharing, our method gives reasonable lower bounds on the underlying number of haplotypes. In comparison, a naive approach of phasing the input genotypes provides trivial upper bounds of twice the number of genotypes
Gene loss rate: a probabilistic measure for the conservation of eukaryotic genes
The rate of conservation of a gene in evolution is believed to be correlated with its biological importance. Recent studies have devised various conservation measures for genes and have shown that they are correlated with several biological characteristics of functional importance. Specifically, the state-of-the-art propensity for gene loss (PGL) measure was shown to be strongly correlated with gene essentiality and its number of protein–protein interactions (PPIs). The observed correlation between conservation and functional importance varies however between conservation measures, underscoring the need for accurate and general measures for the rate of gene conservation. Here we develop a novel maximum-likelihood approach to computing the rate in which a gene is lost in evolution, motivated by the same principles as those underlying PGL. However, in difference to PGL which considers only the most parsimonious ancestral states of the internal nodes of the phylogenetic tree relating the species, our approach weighs in a probabilistic manner all possible ancestral states, and includes the branch length information as part of the probabilistic model. In application to data of 16 eukaryotic genomes, our approach shows higher correlations with experimental data than PGL, including data on gene lethality, level of connectivity in a PPI network and coherence within functionally related genes
QPath: a method for querying pathways in a protein-protein interaction network
BACKGROUND: Sequence comparison is one of the most prominent tools in biological research, and is instrumental in studying gene function and evolution. The rapid development of high-throughput technologies for measuring protein interactions calls for extending this fundamental operation to the level of pathways in protein networks. RESULTS: We present a comprehensive framework for protein network searches using pathway queries. Given a linear query pathway and a network of interest, our algorithm, QPath, efficiently searches the network for homologous pathways, allowing both insertions and deletions of proteins in the identified pathways. Matched pathways are automatically scored according to their variation from the query pathway in terms of the protein insertions and deletions they employ, the sequence similarity of their constituent proteins to the query proteins, and the reliability of their constituent interactions. We applied QPath to systematically infer protein pathways in fly using an extensive collection of 271 putative pathways from yeast. QPath identified 69 conserved pathways whose members were both functionally enriched and coherently expressed. The resulting pathways tended to preserve the function of the original query pathways, allowing us to derive a first annotated map of conserved protein pathways in fly. CONCLUSION: Pathway homology searches using QPath provide a powerful approach for identifying biologically significant pathways and inferring their function. The growing amounts of protein interactions in public databases underscore the importance of our network querying framework for mining protein network data
BeWith: A Between-Within Method to Discover Relationships between Cancer Modules via Integrated Analysis of Mutual Exclusivity, Co-occurrence and Functional Interactions
The analysis of the mutational landscape of cancer, including mutual
exclusivity and co-occurrence of mutations, has been instrumental in studying
the disease. We hypothesized that exploring the interplay between
co-occurrence, mutual exclusivity, and functional interactions between genes
will further improve our understanding of the disease and help to uncover new
relations between cancer driving genes and pathways. To this end, we designed a
general framework, BeWith, for identifying modules with different combinations
of mutation and interaction patterns. We focused on three different settings of
the BeWith schema: (i) BeME-WithFun in which the relations between modules are
enriched with mutual exclusivity while genes within each module are
functionally related; (ii) BeME-WithCo which combines mutual exclusivity
between modules with co-occurrence within modules; and (iii) BeCo-WithMEFun
which ensures co-occurrence between modules while the within module relations
combine mutual exclusivity and functional interactions. We formulated the
BeWith framework using Integer Linear Programming (ILP), enabling us to find
optimally scoring sets of modules. Our results demonstrate the utility of
BeWith in providing novel information about mutational patterns, driver genes,
and pathways. In particular, BeME-WithFun helped identify functionally coherent
modules that might be relevant for cancer progression. In addition to finding
previously well-known drivers, the identified modules pointed to the importance
of the interaction between NCOR and NCOA3 in breast cancer. Additionally, an
application of the BeME-WithCo setting revealed that gene groups differ with
respect to their vulnerability to different mutagenic processes, and helped us
to uncover pairs of genes with potentially synergetic effects, including a
potential synergy between mutations in TP53 and metastasis related DCC gene
A direct comparison of protein interaction confidence assignment schemes
BACKGROUND: Recent technological advances have enabled high-throughput measurements of protein-protein interactions in the cell, producing large protein interaction networks for various species at an ever-growing pace. However, common technologies like yeast two-hybrid may experience high rates of false positive detection. To combat false positive discoveries, a number of different methods have been recently developed that associate confidence scores with protein interactions. Here, we perform a rigorous comparative analysis and performance assessment among these different methods. RESULTS: We measure the extent to which each set of confidence scores correlates with similarity of the interacting proteins in terms of function, expression, pattern of sequence conservation, and homology to interacting proteins in other species. We also employ a new metric, the Signal-to-Noise Ratio of protein complexes embedded in each network, to assess the power of the different methods. Seven confidence assignment schemes, including those of Bader et al., Deane et al., Deng et al., Sharan et al., and Qi et al., are compared in this work. CONCLUSION: Although the performance of each assignment scheme varies depending on the particular metric used for assessment, we observe that Deng et al. yields the best performance overall (in three out of four viable measures). Importantly, we also find that utilizing any of the probability assignment schemes is always more beneficial than assuming all observed interactions to be true or equally likely
Comparative Analysis of Normalization Methods for Network Propagation
Network propagation is a central tool in biological research. While a number of variants and normalizations have been proposed for this method, each has its own shortcomings and no large scale assessment of those variants is available. Here we propose a novel normalization method for network propagation that is based on evaluating the propagation results against those obtained on randomized networks that preserve node degrees. In this way, our method overcomes potential biases of previous methods. We evaluate its performance on multiple large scale datasets and find that it compares favorably to previous approaches in diverse gene prioritization tasks. We further demonstrate its utility on a focused dataset of telomere length maintenance in yeast. The normalization method is available at http://anat.cs.tau.ac.il/WebPropagate
- …